
EliasFano.Build: less iterations in upperBits loop #19556

Merged
Giulio2002 merged 4 commits into main from alex/ef_build_optimize_34 on Mar 2, 2026

@AskAlexSharov
Collaborator

Idea: instead of testing every bit, iterate over words and jump directly to the first non-zero (lowest set) bit in each word.
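
A minimal sketch of the idea, assuming upperBits is a plain `[]uint64` with bit 0 of word 0 as position 0. The function names `naiveSetBits` and `fastSetBits` are hypothetical and not taken from the PR; both return the absolute positions of the set bits, but only the second skips zero bits.

```go
package main

import (
	"fmt"
	"math/bits"
)

// naiveSetBits (hypothetical) tests all 64 bit positions of every word,
// so its inner loop always runs len(words)*64 times.
func naiveSetBits(words []uint64) []uint64 {
	var pos []uint64
	for i, w := range words {
		for b := uint64(0); b < 64; b++ {
			if w&(uint64(1)<<b) != 0 {
				pos = append(pos, uint64(i)*64+b)
			}
		}
	}
	return pos
}

// fastSetBits (hypothetical) visits only the set bits: TrailingZeros64
// gives the index of the lowest set bit, and w &= w-1 clears it.
func fastSetBits(words []uint64) []uint64 {
	var pos []uint64
	for i, w := range words {
		for w != 0 {
			b := uint64(bits.TrailingZeros64(w))
			pos = append(pos, uint64(i)*64+b)
			w &= w - 1 // drop the bit we just reported
		}
	}
	return pos
}

func main() {
	words := []uint64{0b1011, 0, 1 << 63}
	fmt.Println(naiveSetBits(words)) // [0 1 3 191]
	fmt.Println(fastSetBits(words))  // [0 1 3 191]
}
```

Both loops produce the same positions; the difference is that the zero word and the zero bits between set bits cost nothing in `fastSetBits`.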

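Details: all `Build()` functions used a naive bit scan with a fixed 64 iterations per word, one per bit. Replace it with `bits.TrailingZeros64` + `word &= word-1` to iterate over the set bits only: `bits.TrailingZeros64` returns the index of the lowest set bit, and `word &= word-1` clears that bit. `bits.TrailingZeros64` compiles to a single `TZCNT` instruction, and all iterations over zero bits are eliminated.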
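
To show the loop inside a `Build()`-style pass, here is a simplified sketch. It assumes, as Elias-Fano implementations commonly do, that `Build()` walks upperBits to fill a sampling index that speeds up `select`. `buildSelectSamples` and its parameter `q` are hypothetical; this is not the jump-table layout used by the code in this PR.

```go
package main

import (
	"fmt"
	"math/bits"
)

// buildSelectSamples (hypothetical) records the absolute bit position of
// every q-th set bit in upperBits (the 0th, q-th, 2q-th, ...), so a later
// select(k) could start scanning near the k-th one. It uses the set-bit loop:
// one iteration per set bit instead of 64 per word.
func buildSelectSamples(upperBits []uint64, q uint64) []uint64 {
	var samples []uint64
	c := uint64(0) // number of set bits seen so far
	for i, word := range upperBits {
		for word != 0 {
			b := uint64(bits.TrailingZeros64(word)) // lowest set bit in this word
			if c%q == 0 {
				samples = append(samples, uint64(i)*64+b)
			}
			c++
			word &= word - 1 // clear the lowest set bit
		}
	}
	return samples
}

func main() {
	// Set bits at positions 1, 2, 5 (word 0) and 64, 70 (word 1).
	upper := []uint64{0b100110, 0b1000001}
	fmt.Println(buildSelectSamples(upper, 2)) // [1 5 70]
}
```

The per-bit work (the `c%q` check and counter update) is unchanged from a naive scan; only the iterations that would have tested zero bits disappear.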

Result: a `2.75x` speedup. The gain scales with the sparsity of upperBits: at ~34% density, the inner loop runs ~3x fewer iterations (~22 set bits per word instead of 64 bit tests).
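
A quick arithmetic check of the iteration claim, assuming a hypothetical upperBits in which every word has 22 of 64 bits set (22/64 ≈ 34% density). `iterationCounts` is a hypothetical helper that counts inner-loop iterations for both strategies; it says nothing about wall-clock time, which also depends on the per-bit work.

```go
package main

import "fmt"

// iterationCounts (hypothetical) returns how many inner-loop iterations each
// strategy runs: the naive scan always does 64 per word, while the set-bit
// loop (w &= w-1 until w == 0) runs once per set bit.
func iterationCounts(words []uint64) (naive, fast int) {
	for _, w := range words {
		naive += 64
		for w != 0 {
			fast++
			w &= w - 1
		}
	}
	return naive, fast
}

func main() {
	// 22 set bits per word, about 34% density.
	const word = uint64(1)<<22 - 1
	words := make([]uint64, 1000)
	for i := range words {
		words[i] = word
	}
	naive, fast := iterationCounts(words)
	fmt.Println(naive, fast)                                             // 64000 22000
	fmt.Printf("%.2fx fewer iterations\n", float64(naive)/float64(fast)) // 2.91x fewer iterations
}
```

The sparser the words, the smaller `fast` becomes relative to `naive`, which is why the gain grows with sparsity.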

```
cpu: AMD EPYC 4344P 8-Core Processor
100 elements sequence: 233.0 ns/op -> 81.98 ns/op
1M elements sequence: 2344235 ns/op -> 824938 ns/op
```

@AskAlexSharov marked this pull request as ready for review on March 2, 2026 05:08
@AskAlexSharov changed the title from "EliasFano: Build() optimize" to "EliasFano.Build: less iterations in upperBits loop" on Mar 2, 2026
@Giulio2002 merged commit 0ec4079 into main on Mar 2, 2026
25 checks passed
@Giulio2002 deleted the alex/ef_build_optimize_34 branch on March 2, 2026 19:02
sudeepdino008 pushed a commit that referenced this pull request Mar 4, 2026

3 participants